NSF PAR Search | NSF Public Access Repository

Optimal activation of halting multi‐armed bandit models

https://doi.org/10.1002/nav.22145

Cowan, Wesley; Katehakis, Michael N; Ross, Sheldon M (October 2023, Naval Research Logistics (NRL))

Abstract: We study new types of dynamic allocation problems, the Halting Bandit models. As an application, we obtain new proofs for the classic Gittins index decomposition result (compare Gittins, Journal of the Royal Statistical Society, Series B, 1979, 41, 148–177) and for recent results of the authors in Cowan and Katehakis (Probability in the Engineering and Informational Sciences, 2015, 29, 51–76).
Full Text Available
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET

https://doi.org/10.1017/S0269964818000529

Cowan, Wesley; Katehakis, Michael N (July 2020, Probability in the Engineering and Informational Sciences)

The purpose of this paper is to provide further understanding into the structure of the sequential allocation (“stochastic multi-armed bandit”) problem by establishing probability one finite horizon bounds and convergence rates for the sample regret associated with two simple classes of allocation policies. For any slowly increasing function g, subject to mild regularity constraints, we construct two policies (the g-Forcing, and the g-Inflated Sample Mean) that achieve a measure of regret of order O(g(n)) almost surely as n → ∞, bound from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function g effectively controls the “exploration” of the classical “exploration/exploitation” tradeoff.
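The abstract does not spell out the policies, so the following is only an illustrative sketch of how a slowly growing function g might control exploration. It assumes a forcing-style rule (play the least-sampled arm whenever its sample count is below g(n), otherwise play the arm with the highest sample mean) and unit-variance Gaussian rewards. The rule, the reward model, and the names `g_forcing_choice` and `simulate` are assumptions made for illustration, not the authors' definitions.

```python
import math
import random


def g_forcing_choice(counts, means, n, g):
    """Choose an arm for round n under an assumed forcing-style rule."""
    # Exploration: if the least-sampled arm has fewer than g(n) samples, play it.
    least = min(range(len(counts)), key=lambda i: counts[i])
    if counts[least] < g(n):
        return least
    # Exploitation: otherwise play the arm with the highest sample mean.
    return max(range(len(means)), key=lambda i: means[i])


def simulate(true_means, horizon, g, seed=0):
    """Run the rule on Gaussian arms (assumed model); return counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    for n in range(1, horizon + 1):
        i = g_forcing_choice(counts, means, n, g)
        reward = rng.gauss(true_means[i], 1.0)
        counts[i] += 1
        # Incremental update of the sample mean for the chosen arm.
        means[i] += (reward - means[i]) / counts[i]
    best = max(true_means)
    # Pseudo-regret: expected reward lost by not always playing the best arm.
    pseudo_regret = sum((best - m) * c for m, c in zip(true_means, counts))
    return counts, pseudo_regret


if __name__ == "__main__":
    counts, regret = simulate([0.0, 1.0], horizon=2000, g=lambda n: math.log(n + 1))
    print(counts, regret)
```

In this sketch a slower-growing g forces fewer exploratory samples, which is the sense in which g governs the exploration side of the tradeoff; the almost sure bounds quoted in the abstract are not verified by the simulation.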
Full Text Available
Reinforcement learning: a comparison of UCB versus alternative adaptive policies

https://doi.org/10.1515/9783110663075-006

Cowan, Wesley; Katehakis, Michael N; Pirutinsky, Daniel (March 2020, De Gruyter)

Full Text Available
